Context-based Term Disambiguation in Biomedical Literature

Authors

  • Ping Chen
  • Hisham Al-Mubaid
Abstract

The huge volume of unstructured text available online drives an increasing need for automated techniques to analyze and extract knowledge from these repositories of information. Resolving the ambiguity in these texts is an important step for any subsequent analysis task. In this paper, we present a new method for one type of ambiguity resolution: term disambiguation. The method is based on machine learning and can be viewed as a context-based classification approach. In our experiments, we apply it to gene and protein name disambiguation. We have extensively evaluated our method using around 600,000 Medline abstracts and three different classifiers. The results show that our technique is effective, achieving high accuracy, precision, and recall rates, and that it outperforms recently published results on this problem. The paper includes the details of the method and the experimental design. We plan to apply our technique to the general domain of word sense disambiguation in the future.
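As a rough illustration of what "context-based classification" can mean here, the sketch below treats the words surrounding an ambiguous term as features and trains a classifier to choose between the "gene" and "protein" readings. It is not the authors' method: the abstract does not name the three classifiers or the feature design, so the window size, the multinomial naive Bayes stand-in, the TERM placeholder, and the toy sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, index, size=3):
    """Lowercased words within `size` positions of tokens[index], term excluded."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return [w.lower() for w in left + right]


class NaiveBayesDisambiguator:
    """Multinomial naive Bayes over context words (a stand-in classifier)."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        # examples: iterable of (context_words, label) pairs
        for context, label in examples:
            self.class_counts[label] += 1
            for word in context:
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, context):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + vocab_size
            for word in context:
                # Add-one smoothing so unseen words do not zero out a class.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def disambiguate(model, sentence, term="TERM"):
    """Classify the sense of `term` in `sentence` from its surrounding words."""
    tokens = sentence.split()
    return model.predict(context_window(tokens, tokens.index(term)))


# Toy, hand-written training sentences (hypothetical, not from Medline).
# "TERM" marks the position of the ambiguous gene/protein name.
TRAINING = [
    ("mutations in the TERM gene promoter alter transcription", "gene"),
    ("the TERM locus was sequenced and mapped to chromosome 17", "gene"),
    ("TERM binds to the receptor and is phosphorylated", "protein"),
    ("purified TERM protein forms a complex with kinase", "protein"),
]

model = NaiveBayesDisambiguator()
model.train(
    (context_window(s.split(), s.split().index("TERM")), label)
    for s, label in TRAINING
)

print(disambiguate(model, "expression of the TERM gene promoter was reduced"))
print(disambiguate(model, "the purified TERM protein binds to DNA"))
```

With a realistic corpus, the same structure would apply with a larger window, richer features, and whichever classifier the evaluation calls for; only the feature extraction and the `predict` step would change.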


Similar resources

Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation

Word sense disambiguation helps identify the proper sense of ambiguous words in text. With large terminologies such as the UMLS Metathesaurus, ambiguities appear and highly effective disambiguation methods are required. Supervised learning methods are one of the approaches used to perform disambiguation. Features extracted from the context of an ambiguous word are used to identif...


Sense-Based Biomedical Indexing and Retrieval

This paper tackles the problem of term ambiguity, especially in biomedical literature. We propose and evaluate two methods of Word Sense Disambiguation (WSD) for biomedical terms and integrate them into a sense-based document indexing and retrieval framework. Ambiguous biomedical terms in documents and queries are disambiguated using the Medical Subject Headings (MeSH) thesaurus and semantically...


Biomedical Word Sense Disambiguation with Neural Word and Concept Embeddings

Addressing ambiguity issues is an important step in natural language processing (NLP) pipelines designed for information extraction and knowledge discovery. This problem is also common in biomedicine where NLP applications have become indispensable to exploit latent information from biomedical literature and ...


A Learning-Based Approach for Biomedical Word Sense Disambiguation

In the biomedical domain, word sense ambiguity is a widespread problem, yet the bioinformatics research effort devoted to it is not commensurate, leaving room for further development. This paper presents and evaluates a learning-based approach for sense disambiguation within the biomedical domain. The main limitation with supervised methods is the need for a corpus of manually disambiguated insta...


Semantic Relatedness for Biomedical Word Sense Disambiguation

This paper presents a graph-based method for all-word word sense disambiguation of biomedical texts using semantic relatedness as edge weight. Semantic relatedness is derived from a term-topic co-occurrence matrix. The sense inventory is generated by the MetaMap program. Word sense disambiguation is performed on a disambiguation graph via a vertex centrality measure. The proposed method achieve...



Publication date: 2006